THE LOCAL GEOMETRY OF FINITE MIXTURES 
Abstract. We introduce a technique to obtain local (bracketing) metric entropy bounds for subsets of a normed vector space from global entropy bounds. Using this method, we establish that for q ≥ 1, the class of convex combinations of q translates of a probability density has finite local doubling dimension under a smoothness assumption. The proof requires a detailed investigation of the local geometry of mixture classes, which is of independent interest.



1. Introduction 

Let (X, d) be a metric space, and consider a subset $T=\{t_\xi:\xi\in\Xi\}$ of X that is parametrized by a bounded subset Ξ of $\mathbb{R}^d$. Roughly speaking, we are interested in the following question: can T be viewed as a finite-dimensional subset of X? It is certainly tempting to think so, as the parameter set Ξ is finite-dimensional, and this idea is easily made precise if the induced metric $d_T(\xi,\xi')=d(t_\xi,t_{\xi'})$ on Ξ is comparable to a norm on $\mathbb{R}^d$. However, there are natural examples where control by a norm is not straightforward, or even impossible. The aim of this paper is to develop a general method to address such problems, and to study in detail a prototypical problem that arises from applications in statistics.

To set the stage for the problems that we will consider, let us recall some metric notions of dimension. For a subset T of a metric space (X, d), the covering number $N(T,\varepsilon)$ is the smallest cardinality of a covering of T by ε-balls [15]:



$$N(T,\varepsilon)=\inf\Big\{n:\exists\,x_i\in X,\ i=1,\dots,n \text{ s.t. } T\subseteq\bigcup_{i=1}^nB(x_i,\varepsilon)\Big\},$$



where $B(x,\varepsilon)=\{x'\in X:d(x,x')\le\varepsilon\}$. The covering number, or equivalently the metric entropy $\log N(T,\varepsilon)$, quantifies the capacity of the set T, and its scaling in ε is closely connected to dimension. Indeed, let $|\cdot|$ be a norm on $\mathbb{R}^d$, so that $(\mathbb{R}^d,|\cdot|)$ is a finite-dimensional Banach space. A standard estimate [HI Lemma 4.14] gives

$$N(B(t,\delta),\varepsilon)\le\Big(\frac{3\delta}{\varepsilon}\Big)^d$$

for any ε ≤ δ, where $B(t,\delta)=\{x\in\mathbb{R}^d:|x-t|\le\delta\}$. This estimate has two trivial consequences: first, for any bounded $T\subset(\mathbb{R}^d,|\cdot|)$, there is a constant $C_1$ so that

$$N(T,\varepsilon)\le\Big(\frac{C_1}{\varepsilon}\Big)^d \tag{1.1}$$

for all ε sufficiently small. On the other hand, if we fix a distinguished point $t_0\in T$, there is a constant $C_2$ such that for all ε/δ sufficiently small

$$N(T\cap B(t_0,\delta),\varepsilon)\le\Big(\frac{C_2\,\delta}{\varepsilon}\Big)^d. \tag{1.2}$$



Either (1.1) or (1.2) may be used as a notion of finite-dimensionality for a set T in a general metric space (X, d): a set satisfying the global entropy bound (1.1) has finite Kolmogorov dimension $\limsup_{\varepsilon\downarrow0}\log N(T,\varepsilon)/\log(1/\varepsilon)\le d$, while a set satisfying the local entropy bound (1.2) has finite local doubling dimension$^1$ $\limsup_{\varepsilon\downarrow0}\log N(T\cap B(t_0,2\varepsilon),\varepsilon)\lesssim d$. Clearly (1.2) implies (1.1), but not conversely.
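To make the scaling in (1.1) and (1.2) concrete, here is a minimal numerical sketch (Python with NumPy is our choice; nothing here comes from the paper). It replaces the unit disc in $(\mathbb{R}^2,|\cdot|_2)$ by a finite grid of points and counts the centers of a greedy ε-net. The helper `greedy_cover_size` is hypothetical. Its count only upper-bounds $N(T,\varepsilon)$, and because the chosen centers are more than ε apart, it also obeys the volume bound $(1+2/\varepsilon)^2\le(3/\varepsilon)^2$.

```python
import numpy as np

def greedy_cover_size(points, eps):
    """Number of centers in a greedy eps-net of a finite point cloud.

    Each new center is more than eps away from all earlier centers, and every
    point ends up within distance eps of some center, so the count is an
    upper bound for the covering number N(points, eps)."""
    uncovered = np.ones(len(points), dtype=bool)
    n_centers = 0
    for i in range(len(points)):
        if uncovered[i]:
            n_centers += 1
            dist = np.linalg.norm(points - points[i], axis=1)
            uncovered &= dist > eps
    return n_centers

# Finite stand-in for the closed unit disc B(0, 1) in (R^2, Euclidean norm).
g = np.linspace(-1.0, 1.0, 101)
xx, yy = np.meshgrid(g, g)
pts = np.column_stack([xx.ravel(), yy.ravel()])
disc = pts[np.linalg.norm(pts, axis=1) <= 1.0]

for eps in [0.4, 0.2, 0.1, 0.05]:
    n = greedy_cover_size(disc, eps)
    # compare with the volume bound (3 delta / eps)^d for delta = 1, d = 2,
    # and with the crude dimension estimate log N / log(1/eps)
    print(f"eps={eps:5.2f}  greedy N={n:5d}  bound={(3 / eps) ** 2:8.1f}  "
          f"logN/log(1/eps)={np.log(n) / np.log(1 / eps):.2f}")
```

The ratio $\log N/\log(1/\varepsilon)$ should drift toward the dimension 2 as ε shrinks, but only slowly, because the constant $C_1$ is absorbed only in the limit. Once ε drops below the grid spacing of 0.02, the finite point cloud starts to look zero-dimensional.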

Now consider a parametrized set $T=\{t_\xi:\xi\in\Xi\}$ in a metric space (X, d), where Ξ is a bounded subset of $\mathbb{R}^d$, and let $|\cdot|$ be a norm on $\mathbb{R}^d$. As $(\Xi,|\cdot|)$ is finite-dimensional in either sense (1.1) or (1.2), these properties are inherited by T provided that the metric d is comparable to $|\cdot|$. Indeed, if we have a Hölder-type upper bound $d(t_\xi,t_{\xi'})\le C|\xi-\xi'|^\alpha$, then T satisfies a global entropy bound of the form (1.1) (with d replaced by d/α). If we have in addition the lower bound $d(t_\xi,t_{\xi_0})\ge c|\xi-\xi_0|^\alpha$, we obtain the local entropy bound (1.2) with $t_0=t_{\xi_0}$.$^2$ The upper bound is easily obtained in many cases of interest, so that finite-dimensionality in the sense (1.1) is not too problematic. The lower bound is much more delicate, however. In its absence, finite-dimensionality in the sense (1.2) is far from obvious.

Our guiding example, which is of significant independent interest, is a problem that arises from applications in statistics. Let us fix a probability density $f_0$ on $\mathbb{R}^d$ (that is, $f_0\ge0$ and $\int f_0\,dx=1$), and consider the class

$$\mathcal{M}_q=\Big\{\sum_{i=1}^q\pi_if_0(\cdot-\theta_i):\ \pi_i\ge0,\ \sum_{i=1}^q\pi_i=1,\ \theta_i\in\Theta\Big\}$$

of convex combinations of q translates of $f_0$, where Θ is a bounded subset of $\mathbb{R}^d$. Such densities appear in numerous applications, where they are frequently known as location mixtures. $\mathcal{M}_q$ is a subset of the space $\mathcal{M}$ of all probability densities on $\mathbb{R}^d$, endowed with a suitable metric d.

$\mathcal{M}_q$ is parametrized by the finite-dimensional subset $\Xi_q=\Delta_{q-1}\times\Theta^q$ of $\mathbb{R}^{q(d+1)}$, where $\Delta_{q-1}$ is the probability simplex in $\mathbb{R}^q$. Natural metrics d satisfy a Hölder-type upper bound with respect to a norm on $\Xi_q$ (see, e.g., step 2 in the proof of Theorem 3.1 below). However, the corresponding lower bound can be impossible to obtain.

Example 1.1. We will write $f_\theta(x)=f_0(x-\theta)$ for simplicity. Fix $\theta^\star\in\Theta$ and let $f^\star=f_{\theta^\star}$. Then $f^\star\in\mathcal{M}_2$, but $f^\star$ is not uniquely represented by a parameter in $\Xi_2$:

$$\{(\pi,\theta)\in\Xi_2:d(\pi_1f_{\theta_1}+\pi_2f_{\theta_2},f^\star)=0\}=\{\pi\in\Delta_1,\ \theta_1=\theta_2=\theta^\star\}\cup\{\pi_1=0,\ \theta_1\in\Theta,\ \theta_2=\theta^\star\}\cup\{\pi_1=1,\ \theta_1=\theta^\star,\ \theta_2\in\Theta\}.$$

Clearly d cannot be lower bounded by any norm on $\Xi_2$, as such a bound would necessarily imply that $\{(\pi,\theta)\in\Xi_2:d(\pi_1f_{\theta_1}+\pi_2f_{\theta_2},f^\star)=0\}$ consists of a single point. Thus the above approach to (1.2) is useless here.
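The non-identifiability in Example 1.1 is easy to see numerically. The sketch below makes several simplifying assumptions of our own: a standard Gaussian $f_0$ on $\mathbb{R}$, $\theta^\star=0.5$, the Hellinger metric in the role of d, and integrals approximated by Riemann sums on a fixed grid. All function names are hypothetical. One parameter from each of the three branches gives distance zero up to rounding, while a nearby parameter off the branches does not.

```python
import numpy as np

# Grid and Riemann-sum quadrature on R (d = 1); an illustrative simplification.
x = np.linspace(-12.0, 12.0, 48001)
dx = x[1] - x[0]

def f(theta):
    """Translate f_theta(x) = f_0(x - theta) of the standard Gaussian density f_0."""
    return np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2.0 * np.pi)

def hellinger(p, q):
    """h(p, q) = (integral of (sqrt p - sqrt q)^2)^(1/2), approximated on the grid."""
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx)

def mix2(pi1, th1, th2):
    """Two-component location mixture pi1 f_th1 + (1 - pi1) f_th2."""
    return pi1 * f(th1) + (1.0 - pi1) * f(th2)

theta_star = 0.5
f_star = f(theta_star)

# One parameter from each of the three branches representing f_star.
branches = [
    (0.3, theta_star, theta_star),  # pi arbitrary, theta_1 = theta_2 = theta_star
    (0.0, -1.0, theta_star),        # pi_1 = 0, theta_1 arbitrary
    (1.0, theta_star, 2.0),         # pi_1 = 1, theta_2 arbitrary
]
for params in branches:
    print(params, hellinger(mix2(*params), f_star))  # zero up to rounding

# A nearby parameter off the branches has strictly positive distance.
print(hellinger(mix2(0.5, 0.4, 0.6), f_star))
```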

$^1$ The doubling (Assouad) dimension of a set T is defined as the supremum of the local doubling dimension $\sup_\varepsilon\log N(T\cap B(t_0,2\varepsilon),\varepsilon)$ with respect to $t_0$ [2l 114]. For the purposes of this paper, we will consider mainly the local version of this concept, where the point $t_0$ is fixed.

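$^2$ If $d(t_\xi,t_{\xi'})\le C|\xi-\xi'|^\alpha$, then any covering of Ξ by balls of radius $(\varepsilon/C)^{1/\alpha}$ yields a covering of T by ε-balls, so that $N(T,\varepsilon)\le N(\Xi,(\varepsilon/C)^{1/\alpha})\le(C'/\varepsilon)^{d/\alpha}$. If also $d(t_\xi,t_{\xi_0})\ge c|\xi-\xi_0|^\alpha$, then $\{\xi\in\Xi:d(t_\xi,t_{\xi_0})\le\delta\}\subseteq\Xi\cap B(\xi_0,(\delta/c)^{1/\alpha})$, so $N(T\cap B(t_{\xi_0},\delta),\varepsilon)\le(C''\delta/\varepsilon)^{d/\alpha}$.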



FIGURE 1. Let $f_\theta(x)=e^{-(x-\theta)^2/2}/\sqrt{2\pi}$, $f^\star=f_{0.5}$, $\mathcal{M}_2=\{pf_{\theta_1}+(1-p)f_{\theta_2}:p,\theta_1,\theta_2\in[0,1]\}$. The plots illustrate (a) the set of parameters $(p,\theta_1,\theta_2)$ corresponding to the Hellinger ball $\{f\in\mathcal{M}_2:h(f,f^\star)\le0.05\}$; and (b) the parameter set $\{(p,\theta_1,\theta_2):N(p,\theta_1,\theta_2)\le0.05\}$ with $N(p,\theta_1,\theta_2)=|p(\theta_1-0.5)+(1-p)(\theta_2-0.5)|+\tfrac12p(\theta_1-0.5)^2+\tfrac12(1-p)(\theta_2-0.5)^2$. The two plots are related by the local geometry Theorem 3.9, which yields $c^\star N(p,\theta_1,\theta_2)\le h(pf_{\theta_1}+(1-p)f_{\theta_2},f^\star)\le C^\star N(p,\theta_1,\theta_2)$.

The phenomenon illustrated in this example can be stated more generally. For $f^\star\in\mathcal{M}_{q^\star}$ such that $q^\star<q$ (note that $f^\star\in\mathcal{M}_q$, as $\mathcal{M}_q\subseteq\mathcal{M}_{q+1}$ for all q), the subset of parameters $\Xi_q(\delta)\subseteq\Xi_q$ corresponding to the ball $\mathcal{M}_q(\delta)=\{f\in\mathcal{M}_q:d(f,f^\star)\le\delta\}$ behaves nothing at all like a ball in a finite-dimensional Banach space (see Figure 1(a)): indeed, the diameter of $\Xi_q(\delta)$ is even bounded away from zero as $\delta\downarrow0$. There is therefore no hope to deduce a local entropy bound of the form (1.2) for $N(\mathcal{M}_q(\delta),\varepsilon)$ directly from the corresponding bound for a finite-dimensional Banach space. This natural example provides a vivid illustration of the difficulty of establishing local entropy bounds in geometrically irregular settings. The goal of this paper is to develop an approach for the investigation of such problems.

In section 2 we develop a useful technique to obtain local entropy bounds of the form (1.2). This method is not specific to mixtures, and is developed in a very general setting. We are motivated by the fact that, as explained above, global entropy bounds of the form (1.1) are typically much easier to obtain in geometrically complex problems than local entropy bounds. The main results of this section, Theorems 2.4 and 2.6, allow one to deduce a local entropy bound for a subset T of a normed vector space from a global entropy bound for a certain weighted set $D_0$ associated to T. While the latter bound may be far from trivial to obtain, it can provide a significant simplification of the original problem.

In section 3 we obtain local entropy bounds for the mixture classes $\mathcal{M}_q$. For concreteness, we endow $\mathcal{M}_q$ with the Hellinger metric $h(f,g)=\|\sqrt f-\sqrt g\|_{L^2}$, which is the relevant metric for statistical applications [3D1 ch. 7], [T8]. However, our results are easily adapted to other commonly used probability metrics, for example the total variation metric $d_{TV}(f,g)=\|f-g\|_{L^1}$, using almost identical proofs. The main result, Theorem 3.3, provides an explicit bound of the form (1.2) for $\mathcal{M}_q$ under suitable smoothness assumptions on $f_0$.
To prove Theorem 3.3, we first reduce the local entropy bound to a global entropy bound using the technique developed in section 2. To obtain this global entropy bound, however, we must develop a precise understanding of the local geometry of mixtures, which constitutes the main effort in the proof. The key result that we prove in this direction is Theorem 3.9. One consequence of this result, for example, is as follows: given a mixture $f^\star=\sum_{i=1}^{q^\star}\pi_i^\star f_{\theta_i^\star}$, one can choose sufficiently small neighborhoods $A_1,\dots,A_{q^\star}$ of $\theta_1^\star,\dots,\theta_{q^\star}^\star$, respectively, such that for any q ≥ 1 and mixture $f=\sum_{j=1}^q\pi_jf_{\theta_j}$, the Hellinger metric $h(f,f^\star)$ is of the same order as

$$\sum_{j:\theta_j\in A_0}\pi_j+\sum_{i=1}^{q^\star}\Bigg\{\Big|\sum_{j:\theta_j\in A_i}\pi_j-\pi_i^\star\Big|+\Big\|\sum_{j:\theta_j\in A_i}\pi_j(\theta_j-\theta_i^\star)\Big\|+\frac12\sum_{j:\theta_j\in A_i}\pi_j\|\theta_j-\theta_i^\star\|^2\Bigg\}$$

(here $A_0=\mathbb{R}^d\setminus(A_1\cup\dots\cup A_{q^\star})$). This pseudodistance controls precisely the set of parameters in $\Xi_q$ whose density is close to $f^\star$; see Figure 1 for an example.
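The two-sided comparison between $h(f,f^\star)$ and this pseudodistance can be probed in the setting of Figure 1. The sketch assumes the Gaussian kernel on $\mathbb{R}$, $f^\star=f_{0.5}$, and a factor ½ on the quadratic terms of $N(p,\theta_1,\theta_2)$ (constant factors do not affect the comparison). It samples parameters uniformly in $[0,1]^3$ and records the ratio $h/N$. A ratio that stays bounded above and away from zero is what the local geometry theorem predicts. The observed range is only an illustration, not an estimate of $c^\star$ or $C^\star$, and the function names are ours.

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 24001)
dx = x[1] - x[0]

def f(theta):
    """Standard Gaussian translate f_theta on the grid (d = 1)."""
    return np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2.0 * np.pi)

def hellinger(p, q):
    """Riemann-sum approximation of the Hellinger distance."""
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx)

def pseudo_N(p, t1, t2, ts=0.5):
    """Figure 1 pseudodistance: |first moment| + (1/2) * second moments about ts."""
    first = abs(p * (t1 - ts) + (1.0 - p) * (t2 - ts))
    second = 0.5 * p * (t1 - ts) ** 2 + 0.5 * (1.0 - p) * (t2 - ts) ** 2
    return first + second

f_star = f(0.5)
rng = np.random.default_rng(0)
ratios = []
for p, t1, t2 in rng.uniform(0.0, 1.0, size=(200, 3)):
    h = hellinger(p * f(t1) + (1.0 - p) * f(t2), f_star)
    ratios.append(h / pseudo_N(p, t1, t2))
ratios = np.array(ratios)
print(ratios.min(), ratios.max())  # bounded above and away from zero
```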

Besides their intrinsic interest, the results in this paper are of direct relevance to statistical applications. Many problems in statistics and probability make use of estimates on the metric entropy of classes of densities: metric entropy controls the rate of convergence of uniform limit theorems in probability, and is therefore of central importance in the design and analysis of statistical estimators [UJ [201 HH]. Such applications frequently require a slightly stronger notion of metric entropy known as bracketing entropy, which we will consider throughout this paper; see section 2. In infinite-dimensional situations, the global entropy is chiefly of interest: global entropy estimates for various classes of probability densities can be found in [3TJ HH IS1 IH]. However, in finite-dimensional settings, global entropy bounds are known to yield sub-optimal results, and here local entropy bounds are essential to obtain optimal convergence rates of estimators §7.5]. In the case of mixtures, the difficulty of obtaining local entropy bounds was noted, e.g., in [121 IT5]. Applications of the results in this paper are given in [TTJ [TUJ.

2. From global entropy to local entropy 

2.1. Definitions and results. We will consider two different notions of covering 
in normed vector spaces. The first is the classical covering by balls. 

Definition 2.1. Let $(X,\|\cdot\|)$ be a normed vector space. For any subset $T\subseteq X$ and ε > 0, the covering number $N(T,\varepsilon)$ is defined as

$$N(T,\varepsilon)=\inf\Big\{n:\exists\,x_i\in X,\ i=1,\dots,n \text{ s.t. } T\subseteq\bigcup_{i=1}^nB(x_i,\varepsilon)\Big\},$$

where $B(x,\varepsilon)=\{x'\in X:\|x-x'\|\le\varepsilon\}$.

The second notion that we will consider is covering by brackets (order intervals), 
which requires a lattice structure. We will work in the general setting of normed 
vector lattices (normed Riesz spaces, see [JJ for a basic introduction). 

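Definition 2.2. Let $(X,\|\cdot\|)$ be a normed vector lattice. For any subset $T\subseteq X$ and ε > 0, the bracketing number $\mathcal{N}(T,\varepsilon)$ is defined as

$$\mathcal{N}(T,\varepsilon)=\inf\Big\{n:\exists\,l_i,u_i\in X,\ \|u_i-l_i\|\le\varepsilon,\ i=1,\dots,n \text{ s.t. } T\subseteq\bigcup_{i=1}^n[l_i,u_i]\Big\},$$

where $[l,u]=\{x\in X:l\le x\le u\}$.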
Remark 2.3. In a normed vector lattice $(X,\|\cdot\|)$, the covering and bracketing numbers are both well defined. As $[l,u]\subseteq B(l,\|u-l\|)$ (if $l\le x\le u$, then $0\le x-l\le u-l$, so $\|x-l\|\le\|u-l\|$ because the norm of a normed vector lattice is monotone), it is evident that $N(T,\varepsilon)\le\mathcal{N}(T,\varepsilon)$ for any $T\subseteq X$ and ε > 0. Bounds on the bracketing number therefore imply bounds on the covering number, but not conversely. The finer covering by brackets is essential in many probabilistic and statistical applications [211 US HB].
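The inclusion $[l,u]\subseteq B(l,\|u-l\|)$ only uses the fact that the norm is a lattice norm. As a sketch, take X to be functions on a grid of [0, 1] with the discrete $L^2$ norm (our stand-in for a normed vector lattice), fix one bracket, and sample elements inside it. The specific l and u below are arbitrary choices.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 101)

def lattice_norm(g):
    """Discrete L^2 norm on [0, 1]. It is a lattice norm: |g| <= |k| pointwise
    implies lattice_norm(g) <= lattice_norm(k)."""
    return np.sqrt(np.mean(g ** 2))

lower = np.sin(2 * np.pi * grid)            # l
upper = lower + 0.1 * (1.0 + grid)          # u, with l <= u pointwise
bracket_size = lattice_norm(upper - lower)  # ||u - l||

rng = np.random.default_rng(1)
worst = 0.0
for _ in range(1000):
    w = rng.uniform(size=grid.shape)
    x = lower + w * (upper - lower)          # an arbitrary element of [l, u]
    worst = max(worst, lattice_norm(x - lower))
print(worst, bracket_size)                   # worst <= bracket_size
```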



Let $(X,\|\cdot\|)$ be a normed vector space, and let us fix a subset $T\subseteq X$ and a distinguished point $t_0\in T$. Our aim is to obtain an estimate on the local covering (or bracketing) number $N(T\cap B(t_0,\delta),\varepsilon)$ that is polynomial in δ/ε. As explained in the introduction, such estimates can be much more difficult to obtain than the corresponding estimates on the global covering number $N(T,\varepsilon)$ that are polynomial in 1/ε. Unfortunately, the latter is strictly weaker than the former.

Nonetheless, global covering estimates can be useful. For any $t\ne t_0$, define

$$d_t=\frac{t-t_0}{\|t-t_0\|},\qquad D_0=\{d_t:t\in T,\ t\ne t_0\}.$$
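For a finite set T in $\mathbb{R}^2$, the weighted class $D_0$ is simply the set of unit directions from $t_0$. The hypothetical helper `weighted_class` below computes it. Points at very different distances from $t_0$ all land on the unit sphere, which is why a global bound on $D_0$ can carry information about T at every scale around $t_0$.

```python
import numpy as np

def weighted_class(T, t0):
    """Return D_0 = {(t - t0)/||t - t0|| : t in T, t != t0} and the radii ||t - t0||."""
    diffs = T - t0
    radii = np.linalg.norm(diffs, axis=1)
    keep = radii > 0                      # drop t = t0
    return diffs[keep] / radii[keep, None], radii[keep]

# Points at very different scales around t0 = (0, 0).
T = np.array([[0.0, 0.0], [1e-3, 0.0], [0.0, 2e-3], [0.5, 0.5], [3.0, -4.0]])
t0 = T[0]
D0, radii = weighted_class(T, t0)
print(D0)      # every row has Euclidean norm 1
print(radii)   # the scale information removed by the normalization
```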

The main message of this section is that a local covering estimate for T can be obtained from a global covering estimate for the weighted class $D_0$. As global entropy estimates can be much easier to obtain than local entropy estimates, this provides a very useful approach to obtaining local entropy bounds for geometrically complex classes. We will give two versions of our main result, one for bracketing numbers (Theorem 2.4) and one for covering numbers (Theorem 2.6).

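Theorem 2.4. Let $(X,\|\cdot\|)$ be a normed vector lattice. Fix $T\subseteq X$ and $t_0\in T$, and let $D_0$ be as above. Suppose that there exist $q,C_0\ge1$ and $\varepsilon_0>0$ such that

$$\mathcal{N}(D_0,\varepsilon)\le\Big(\frac{C_0}{\varepsilon}\Big)^q\quad\text{for every }\varepsilon\le\varepsilon_0.$$

Choose any $\bar d\in X$ such that $|d_t|\le\bar d$ for all $t\in T$, $t\ne t_0$. Then

$$\mathcal{N}(T\cap B(t_0,\delta),\rho)\le\Big(\frac{8C\delta}{\rho}\Big)^{q+1}$$

for all δ, ρ > 0 such that $\rho/\delta<4\wedge2\|\bar d\|$, where $C=C_0(1\vee\|\bar d\|/4\varepsilon_0)$.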



Remark 2.5. Theorem 2.4 requires an upper bound $\bar d\in X$ on $|D_0|$, that is, $D_0$ must be order-bounded. But the assumptions of the theorem already require that $\mathcal{N}(D_0,\varepsilon_0)<\infty$, which is easily seen to imply order-boundedness of $D_0$ (if finitely many brackets $[l_i,u_i]$ cover $D_0$, then $|d|\le\sum_i(|l_i|+|u_i|)$ for every $d\in D_0$). The latter therefore does not need to be added as a separate assumption.

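Theorem 2.6. Let $(X,\|\cdot\|)$ be a normed vector space. Fix $T\subseteq X$ and $t_0\in T$, and let $D_0$ be as above. Suppose that there exist $q,C_0\ge1$ and $\varepsilon_0>0$ such that

$$N(D_0,\varepsilon)\le\Big(\frac{C_0}{\varepsilon}\Big)^q\quad\text{for every }\varepsilon\le\varepsilon_0.$$

Then

$$N(T\cap B(t_0,\delta),\rho)\le\Big(\frac{2C\delta}{\rho}\Big)^{q+1}$$

for all δ, ρ > 0 such that ρ/δ ≤ 1, where $C=C_0/(1\wedge2\varepsilon_0)$.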



Remark 2.7. In the above results, a global covering bound for $D_0$ of order $(1/\varepsilon)^q$ gives a local covering bound for T of order $(\delta/\varepsilon)^{q+1}$. It is instructive to note that this polynomial scaling cannot be improved. Indeed, let T be the unit (Euclidean) ball in $\mathbb{R}^{q+1}$, and let $t_0=0$. Then $D_0$ is the unit sphere in $\mathbb{R}^{q+1}$ and therefore has Kolmogorov dimension q, but the covering number of $B(0,\delta)$ is of order $(\delta/\varepsilon)^{q+1}$.

Remark 2.8. A natural question is whether a converse to the above results can be obtained. In general, however, this is not possible: the class $D_0$ can be much richer than the original class T, as the following simple example illustrates. Let $(X,\|\cdot\|)$ be an infinite-dimensional Hilbert space and let $(e_k)_{k\ge1}$ be an orthonormal basis. Let $T=\{2^{-k}e_k:k\ge1\}\cup\{0\}$ and $t_0=0$. Then $N(T\cap B(t_0,2^{-r}),2^{-k})\le k-r+1$ for k ≥ r, so $N(T\cap B(t_0,\delta),\varepsilon)\le\log_2(8\delta/\varepsilon)\le(8\delta/\varepsilon)^{3/2}$ for all ε/δ ≤ 1. But here we have $D_0=\{e_k:k\ge1\}$, so $N(D_0,\varepsilon)=\infty$ for all ε > 0 small enough.
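The explicit cover behind Remark 2.8 can be checked after truncating the Hilbert space to $\mathbb{R}^K$. The truncation (K = 30) is our assumption, made only so that the example is finite, and the helper names are hypothetical. The cover of $T\cap B(0,2^{-r})$ at scale $2^{-k}$ consists of the origin together with $2^{-j}e_j$ for $r\le j<k$. The elements of $D_0$, being the basis vectors, are pairwise $\sqrt2$ apart.

```python
import numpy as np

K = 30  # truncation of the Hilbert space to R^K (illustrative assumption)
# Row 0 is t0 = 0; row k (1 <= k <= K) is 2^{-k} e_k.
T = np.vstack([np.zeros(K)] + [2.0 ** -k * np.eye(K)[k - 1] for k in range(1, K + 1)])

def explicit_cover(r, k):
    """Centers from Remark 2.8: the origin plus 2^{-j} e_j for r <= j < k."""
    return [np.zeros(K)] + [T[j] for j in range(r, k)]

def covers(points, centers, eps):
    """True if every point lies within distance eps of some center."""
    return all(min(np.linalg.norm(p - c) for c in centers) <= eps for p in points)

r, k = 3, 9
ball = [t for t in T if np.linalg.norm(t) <= 2.0 ** -r]   # T intersected with B(0, 2^{-r})
centers = explicit_cover(r, k)
print(len(centers), covers(ball, centers, 2.0 ** -k))     # k - r + 1 centers suffice

# D_0 = {e_k}: distinct elements are sqrt(2) apart, so no finite eps-net exists for small eps.
D0 = T[1:] / np.linalg.norm(T[1:], axis=1)[:, None]
print(np.linalg.norm(D0[0] - D0[1]))
```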



The proofs of Theorems 2.4 and 2.6 are almost identical. We will give a complete proof for the bracketing version (Theorem 2.4) in section 2.2 and briefly sketch the changes needed for its covering counterpart (Theorem 2.6) in section 2.3.



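2.2. Proof of Theorem 2.4. The assumption implies that

$$\mathcal{N}(D_0,\varepsilon)\le\Big(\frac{C_0}{\varepsilon\wedge\varepsilon_0}\Big)^q\quad\text{for every }\varepsilon>0.$$

If $\varepsilon\le\|\bar d\|/4$, then

$$\frac{C_0}{\varepsilon\wedge\varepsilon_0}\le\frac{C_0}{\varepsilon}\Big(1\vee\frac{\|\bar d\|}{4\varepsilon_0}\Big).$$

We therefore have

$$\mathcal{N}(D_0,\varepsilon)\le\Big(\frac{C}{\varepsilon}\Big)^q\quad\text{for every }\varepsilon\le\|\bar d\|/4,$$

where C is as defined in the theorem. This estimate will be used below.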

Fix ε, δ > 0 and let $N=\mathcal{N}(D_0,\varepsilon)$. Then there exist $l_1,u_1,\dots,l_N,u_N\in X$ such that $\|u_i-l_i\|\le\varepsilon$ for all $i=1,\dots,N$, and for every $t\in T$, $t\ne t_0$ there is an $1\le i\le N$ such that $l_i\le d_t\le u_i$. Choose $t\in T$ such that $r^{-n}\delta<\|t-t_0\|\le r^{-n+1}\delta$ (with r > 1 to be chosen later). As $t=t_0+\|t-t_0\|\,d_t$, there exists $1\le i\le N$ so that

$$(r^{-n}l_i\wedge r^{-n+1}l_i)\delta+t_0\le t\le(r^{-n}u_i\vee r^{-n+1}u_i)\delta+t_0.$$

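Note that

$$\begin{aligned}
\|u_ir^{-n}\delta-l_ir^{-n}\delta\|&\le r^{-n}\delta\varepsilon,\\
\|u_ir^{-n+1}\delta-l_ir^{-n+1}\delta\|&\le r^{-n+1}\delta\varepsilon,\\
\|u_ir^{-n+1}\delta-l_ir^{-n}\delta\|&\le(r-1)r^{-n}\delta+r^{-n+1}\delta\varepsilon,\\
\|u_ir^{-n}\delta-l_ir^{-n+1}\delta\|&\le(r-1)r^{-n}\delta+r^{-n+1}\delta\varepsilon,
\end{aligned}$$

where the latter two estimates follow from $l_i\le d_t\le u_i$, $\|d_t\|=1$, and

$$\begin{aligned}
(u_i-l_i)r^{-n}\delta&\le u_ir^{-n+1}\delta-l_ir^{-n}\delta-d_t(r-1)r^{-n}\delta\le(u_i-l_i)r^{-n+1}\delta,\\
(u_i-l_i)r^{-n}\delta&\le u_ir^{-n}\delta-l_ir^{-n+1}\delta+d_t(r-1)r^{-n}\delta\le(u_i-l_i)r^{-n+1}\delta.
\end{aligned}$$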
As $|a\vee b-c\wedge d|\le|a-c|+|a-d|+|b-c|+|b-d|$, we can estimate

$$\|(r^{-n}u_i\vee r^{-n+1}u_i)\delta-(r^{-n}l_i\wedge r^{-n+1}l_i)\delta\|\le2(r-1)r^{-n}\delta+4r^{-n+1}\delta\varepsilon.$$

Therefore, we have shown that

$$\mathcal{N}\big(\{t\in T:r^{-n}\delta<\|t-t_0\|\le r^{-n+1}\delta\},\,2(r-1)r^{-n}\delta+4r^{-n+1}\delta\varepsilon\big)\le\mathcal{N}(D_0,\varepsilon)$$
for arbitrary ε, δ > 0, r > 1, n ∈ ℕ. In particular,

$$\mathcal{N}\big(\{t\in T:r^{-n}\delta<\|t-t_0\|\le r^{-n+1}\delta\},\rho\big)\le\mathcal{N}\Big(D_0,\frac{r^{n-1}\rho}{4\delta}-\frac{1-1/r}{2}\Big)$$

for every δ > 0, r > 1, n ∈ ℕ, $\rho>2(r-1)r^{-n}\delta$.
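The elementary lattice inequality $|a\vee b-c\wedge d|\le|a-c|+|a-d|+|b-c|+|b-d|$ used above holds because $a\vee b-c\wedge d$ is the largest of the four differences $a-c$, $a-d$, $b-c$, $b-d$. Below is a quick pointwise check on random vectors, that is, in the lattice $\mathbb{R}^n$ with the coordinatewise order (our choice of test lattice).

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, c, d = rng.normal(size=(4, 10_000))   # four random elements of the lattice R^10000

lhs = np.abs(np.maximum(a, b) - np.minimum(c, d))
rhs = np.abs(a - c) + np.abs(a - d) + np.abs(b - c) + np.abs(b - d)
# a v b - c ^ d is the largest of the four differences
four_max = np.max(np.stack([a - c, a - d, b - c, b - d]), axis=0)
print(np.allclose(np.maximum(a, b) - np.minimum(c, d), four_max), bool(np.all(lhs <= rhs)))
```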

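Choose an envelope $\bar d\in X$ such that $|d_t|\le\bar d$ for all $t\in T$, $t\ne t_0$. Evidently

$$t_0-r^{-H}\delta\bar d\le t\le t_0+r^{-H}\delta\bar d$$

for all $t\in T$ such that $\|t-t_0\|\le r^{-H}\delta$. Therefore

$$\mathcal{N}\big(\{t\in T:\|t-t_0\|\le r^{-H}\delta\},\,2r^{-H}\delta\|\bar d\|\big)=1$$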

for all δ > 0, r > 1, H ≥ 0. Thus we can estimate

$$\begin{aligned}
\mathcal{N}(T\cap B(t_0,\delta),2r^{-H}\delta\|\bar d\|)&\le1+\sum_{n=1}^H\mathcal{N}\big(\{t\in T:r^{-n}\delta<\|t-t_0\|\le r^{-n+1}\delta\},\,2r^{-H}\delta\|\bar d\|\big)\\
&\le1+\sum_{n=1}^H\mathcal{N}\big(D_0,\{r^{n-1-H}\|\bar d\|-(1-1/r)\}/2\big)
\end{aligned}$$

whenever δ > 0, r > 1, H ≥ 0 are such that $\|\bar d\|>(1-1/r)r^H$. In particular,

$$\mathcal{N}(T\cap B(t_0,\delta),2r^{-H}\delta\|\bar d\|)\le1+\sum_{n=1}^H\mathcal{N}(D_0,r^{n-1-H}\|\bar d\|/4)$$

whenever δ > 0, r > 1, H ≥ 0 are such that $\|\bar d\|\ge2(1-1/r)r^H$, where we have used that the bracketing number is a nonincreasing function of the bracket size.
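Now recall that

$$\mathcal{N}(D_0,\varepsilon)\le\Big(\frac{C}{\varepsilon}\Big)^q\quad\text{for every }0<\varepsilon\le\|\bar d\|/4,$$

where q, C ≥ 1. Thus

$$\mathcal{N}(T\cap B(t_0,\delta),2r^{-H}\delta\|\bar d\|)\le1+\sum_{n=1}^Hr^{-(n-1)q}\Big(\frac{8C}{2r^{-H}\|\bar d\|}\Big)^q$$

whenever δ > 0, r > 1, H ≥ 0 are such that $\|\bar d\|\ge2(1-1/r)r^H$. But

$$\sum_{n=1}^Hr^{-(n-1)q}\le\frac{1}{1-1/r^q}\le\frac{1}{1-1/r}$$

as r > 1 and q ≥ 1. We can therefore estimate

$$\mathcal{N}(T\cap B(t_0,\delta),2r^{-H}\delta\|\bar d\|)\le1+\frac{1}{1-1/r}\Big(\frac{8C}{2r^{-H}\|\bar d\|}\Big)^q$$

whenever δ > 0, r > 1, H ≥ 0 are such that $\|\bar d\|\ge2(1-1/r)r^H$.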
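We now fix δ, ρ > 0 such that $\rho/\delta<4\wedge2\|\bar d\|$, and choose

$$r=\frac{4}{4-\rho/\delta},\qquad H=\frac{\log(2\|\bar d\|\delta/\rho)}{\log r}.$$

Clearly r > 1 and H ≥ 0. Moreover, note that our choice of r and H implies that $\|\bar d\|=2(1-1/r)r^H$ and $\rho=2r^{-H}\delta\|\bar d\|$, so that $1/(1-1/r)=4\delta/\rho$ and $8C/(2r^{-H}\|\bar d\|)=8C\delta/\rho\ge2$. As C ≥ 1, we have therefore shown that

$$\mathcal{N}(T\cap B(t_0,\delta),\rho)\le1+\frac{4\delta}{\rho}\Big(\frac{8C\delta}{\rho}\Big)^q\le\Big(\frac{8C\delta}{\rho}\Big)^{q+1}$$

for all δ, ρ > 0 such that $\rho/\delta<4\wedge2\|\bar d\|$. □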
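The arithmetic behind the choice of r and H at the end of this proof can be checked numerically. The sketch computes $r=4/(4-\rho/\delta)$ and $H=\log(2\|\bar d\|\delta/\rho)/\log r$ and confirms the identities $\|\bar d\|=2(1-1/r)r^H$ and $\rho=2r^{-H}\delta\|\bar d\|$. It also checks the elementary inequality $1+(4\delta/\rho)(8C\delta/\rho)^q\le(8C\delta/\rho)^{q+1}$ for C, q ≥ 1 and ρ/δ < 4, which is one way to conclude. Variable names are ours, and `dbar` stands for the scalar $\|\bar d\|$.

```python
import math

def choose_r_H(rho, delta, dbar):
    """r and H from the end of the proof of Theorem 2.4; dbar stands for ||d-bar||."""
    if not 0.0 < rho / delta < min(4.0, 2.0 * dbar):
        raise ValueError("need 0 < rho/delta < 4 and rho/delta < 2*dbar")
    r = 4.0 / (4.0 - rho / delta)
    H = math.log(2.0 * dbar * delta / rho) / math.log(r)
    return r, H

def final_bound_holds(rho, delta, C, q):
    """Elementary step: 1 + (4 delta/rho)(8 C delta/rho)^q <= (8 C delta/rho)^(q+1)."""
    X = 8.0 * C * delta / rho
    return 1.0 + (4.0 * delta / rho) * X ** q <= X ** (q + 1)

rho, delta, dbar = 0.3, 1.0, 2.5
r, H = choose_r_H(rho, delta, dbar)
print(r, H)
print(dbar, 2.0 * (1.0 - 1.0 / r) * r ** H)       # ||dbar|| = 2(1 - 1/r) r^H
print(rho, 2.0 * r ** (-H) * delta * dbar)        # rho = 2 r^{-H} delta ||dbar||
print(final_bound_holds(rho, delta, C=1.0, q=1.0))
```

Note that H is generally not an integer. The sketch only verifies the identities used in the text and does not address that point.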



2.3. Proof of Theorem 2.6. Fix ε, δ > 0 and let $N=N(D_0,\varepsilon)$. Then there exist $x_1,\dots,x_N\in X$ such that for every $t\in T$, $t\ne t_0$ there is an $1\le i\le N$ such that $\|d_t-x_i\|\le\varepsilon$. Choose $t\in T$ such that $r^{-n}\delta<\|t-t_0\|\le r^{-n+1}\delta$ (with r > 1 to be chosen later). Then there exists $1\le i\le N$ so that

$$\|t-t_0-x_ir^{-n}\delta\|\le\|t-t_0-d_tr^{-n}\delta\|+r^{-n}\delta\|d_t-x_i\|\le(r-1)r^{-n}\delta+r^{-n}\delta\varepsilon,$$

where we have used that $\|d_t\|=1$. Therefore, we have shown that

$$N\big(\{t\in T:r^{-n}\delta<\|t-t_0\|\le r^{-n+1}\delta\},\rho\big)\le N(D_0,r^n\rho/\delta-r+1)$$

for every δ > 0, r > 1, n ∈ ℕ, $\rho>(r-1)r^{-n}\delta$. On the other hand, clearly

$$N\big(\{t\in T:\|t-t_0\|\le r^{-H}\delta\},r^{-H}\delta\big)=1$$

for all δ > 0, r > 1, H ≥ 0. The remainder of the proof follows along exactly the same lines as that of Theorem 2.4 and is therefore omitted. □
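The single estimate behind the covering version, $\|t-t_0-x_ir^{-n}\delta\|\le(r-1)r^{-n}\delta+r^{-n}\delta\varepsilon$, can be sanity-checked with random vectors in $\mathbb{R}^5$ under the Euclidean norm. The dimension and the values of r, n, δ and ε are arbitrary choices of ours. The script reports the worst ratio of the left side to the right side, which should not exceed 1.

```python
import numpy as np

rng = np.random.default_rng(3)
r, n, delta, eps = 1.5, 3, 1.0, 0.2
lo = r ** -n * delta                  # shell: lo < ||t - t0|| <= r * lo
bound = (r - 1.0) * lo + lo * eps     # right-hand side (r - 1) r^{-n} delta + r^{-n} delta eps

worst = 0.0
for _ in range(2000):
    t0 = rng.normal(size=5)
    u = rng.normal(size=5)
    d_t = u / np.linalg.norm(u)                    # unit direction d_t
    t = t0 + rng.uniform(lo, r * lo) * d_t         # a point in the shell
    v = rng.normal(size=5)
    x_i = d_t + eps * rng.uniform() * v / np.linalg.norm(v)   # ||d_t - x_i|| <= eps
    worst = max(worst, np.linalg.norm(t - t0 - x_i * lo) / bound)
print(worst)  # should not exceed 1
```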



3. The local entropy of mixtures 

3.1. Definitions and main results. Let μ be the Lebesgue measure on $\mathbb{R}^d$. We fix a positive probability density $f_0$ with respect to μ ($f_0>0$ and $\int f_0\,d\mu=1$), and consider mixtures (finite convex combinations) of densities in the class

$$\{f_\theta:\theta\in\mathbb{R}^d\},\qquad f_\theta(x)=f_0(x-\theta)\quad\forall\,x\in\mathbb{R}^d.$$

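In everything that follows we fix a nondegenerate mixture $f^\star$ of the form

$$f^\star=\sum_{i=1}^{q^\star}\pi_i^\star f_{\theta_i^\star}.$$

Nondegenerate means that $\pi_i^\star>0$ for all i, and $\theta_i^\star\ne\theta_j^\star$ for all i ≠ j.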

Let $\Theta\subset\mathbb{R}^d$ be a bounded parameter set such that $\{\theta_i^\star:i=1,\dots,q^\star\}\subseteq\Theta$, and denote its diameter by 2T (that is, Θ is included in some closed Euclidean ball of radius T). We consider for q ≥ 1 the family of q-mixtures

$$\mathcal{M}_q=\Big\{\sum_{i=1}^q\pi_if_{\theta_i}:\ \pi_i\ge0,\ \sum_{i=1}^q\pi_i=1,\ \theta_i\in\Theta\Big\}.$$

The goal of this section is to obtain a local entropy bound for $\mathcal{M}_q$ at the point $f^\star$, where $\mathcal{M}_q$ is endowed with the Hellinger metric

$$h(f,g)=\Big[\int(\sqrt f-\sqrt g)^2\,d\mu\Big]^{1/2},\qquad f,g\in\mathcal{M}_q.$$



That is, we seek bounds on quantities such as $N_h(\{f\in\mathcal{M}_q:h(f,f^\star)\le\varepsilon\},\delta)$, where $N_h$ denotes the covering number in the metric space $(\mathcal{M}_q,h)$ (i.e., covering by Hellinger balls). In fact, we prove a stronger bound of bracketing type. Our choice of the Hellinger metric and the particular form of the bracketing number to be considered are directly motivated by statistical applications [20j ch. 7], [HI §7.4]; see [TTJ [10] for statistical applications of the results below. We will adhere to this setting for concreteness, though other metrics may similarly be considered.

In the sequel, we denote by $\|\cdot\|_p$ the $L^p(f^\star d\mu)$-norm, that is, $\|g\|_p^p=\int|g|^pf^\star d\mu$.

Note that the Hellinger metric can be written as $h(f,g)=\|\sqrt{f/f^\star}-\sqrt{g/f^\star}\|_2$. To obtain covering bounds for $\mathcal{M}_q$ in the Hellinger metric, we can therefore apply the results of section 2 for the case where $(X,\|\cdot\|)$ is the Banach lattice $L^2(f^\star d\mu)$, $T=\{\sqrt{f/f^\star}:f\in\mathcal{M}_q\}$, and $t_0=1$. Indeed, it is easily seen that

$$N_h(\{f\in\mathcal{M}_q:h(f,f^\star)\le\varepsilon\},2\delta)\le N(\mathcal{H}_q(\varepsilon),\delta)\le\mathcal{N}(\mathcal{H}_q(\varepsilon),\delta),$$

where we have defined

$$\mathcal{H}_q(\varepsilon)=\{\sqrt{f/f^\star}:f\in\mathcal{M}_q,\ \|\sqrt{f/f^\star}-1\|_2\le\varepsilon\}.$$
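Our aim is to obtain a polynomial bound for the bracketing number $\mathcal{N}(\mathcal{H}_q(\varepsilon),\delta)$. To this end, we will apply Theorem 2.4 to the weighted class $\mathcal{D}_q$ defined by

$$\mathcal{D}_q=\{d_f:f\in\mathcal{M}_q,\ f\ne f^\star\},\qquad d_f=\frac{\sqrt{f/f^\star}-1}{h(f,f^\star)}.$$

The essential difficulty is now to control the global entropy of $\mathcal{D}_q$.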
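The identity $h(f,g)=\|\sqrt{f/f^\star}-\sqrt{g/f^\star}\|_2$ and the normalization $\|d_f\|_2=1$ of the elements of $\mathcal{D}_q$ can be checked numerically. The sketch assumes d = 1, a Gaussian $f_0$, and a hypothetical two-component $f^\star$ and three-component f. Integrals are Riemann sums on a grid, and all names are ours.

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 24001)
dx = x[1] - x[0]

def gauss(theta):
    """Standard Gaussian translate f_theta on the grid."""
    return np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2.0 * np.pi)

def mixture(weights, thetas):
    """Finite location mixture sum_i w_i f_{theta_i}."""
    return sum(w * gauss(t) for w, t in zip(weights, thetas))

f_star = mixture([0.4, 0.6], [-1.0, 1.5])          # hypothetical nondegenerate f*
f = mixture([0.3, 0.3, 0.4], [-1.2, 0.0, 1.5])     # some element of M_3

def norm2(g):
    """||g||_2 in L^2(f* d mu), via a Riemann sum."""
    return np.sqrt(np.sum(g ** 2 * f_star) * dx)

h_direct = np.sqrt(np.sum((np.sqrt(f) - np.sqrt(f_star)) ** 2) * dx)
h_weighted = norm2(np.sqrt(f / f_star) - 1.0)      # ||sqrt(f/f*) - 1||_2
d_f = (np.sqrt(f / f_star) - 1.0) / h_weighted      # the element d_f of D_q
print(h_direct, h_weighted, norm2(d_f))
```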
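The following notation will be used throughout:

$$H_0(x)=\sup_{\theta\in\Theta}\frac{f_\theta(x)}{f^\star(x)},\qquad H_1(x)=\sup_{\theta\in\Theta}\max_{i=1,\dots,d}\frac{|\partial f_\theta(x)/\partial\theta^i|}{f^\star(x)},$$

$$H_2(x)=\sup_{\theta\in\Theta}\max_{i,j=1,\dots,d}\frac{|\partial^2f_\theta(x)/\partial\theta^i\partial\theta^j|}{f^\star(x)},\qquad H_3(x)=\sup_{\theta\in\Theta}\max_{i,j,k=1,\dots,d}\frac{|\partial^3f_\theta(x)/\partial\theta^i\partial\theta^j\partial\theta^k|}{f^\star(x)}$$

when $f_0$ is sufficiently differentiable, $\mathcal{M}=\bigcup_{q\ge1}\mathcal{M}_q$ and $\mathcal{D}=\bigcup_{q\ge1}\mathcal{D}_q$.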
Assumption A. The following hold:

(1) $f_0\in C^3$, and $f_0(x)$ and $(\partial f_0/\partial x^i)(x)$, $i=1,\dots,d$, vanish as $\|x\|\to\infty$.

(2) $H_k\in L^4(f^\star d\mu)$ for k = 0, 1, 2 and $H_3\in L^2(f^\star d\mu)$.

We can now state our main result, whose proof is given in section 3.3.

Theorem 3.1. Suppose that Assumption A holds. Then there exist constants $C^\star$ and $\delta^\star$, which depend on d, $q^\star$ and $f^\star$ but not on Θ, q or δ, such that

N(2M) < ^ C*(T V l)^(\\H a \\i VjHinv jjggjjj V jjgajjf) y 8(d+1)g 

for all $q\ge q^\star$, $\delta\le\delta^\star$. Moreover, there is a function $D\in L^4(f^\star d\mu)$ with

$$\|D\|_4\le K^\star(\|H_0\|_4\vee\|H_1\|_4\vee\|H_2\|_4),$$

where $K^\star$ depends only on d and $f^\star$, such that $|d|\le D$ for all $d\in\mathcal{D}$.

Remark 3.2. Assumption A is essentially a smoothness assumption on $f_0$. Some sort of smoothness is certainly needed for a result such as Theorem 3.1 to hold: see §3] for a counterexample in the non-smooth case.

The bound of Theorem 3.1 is of independent interest (such a bound was assumed, e.g., in [6l I16j, without proof or with an incorrect proof). On the other hand, combining Theorems 2.4 and 3.1, we immediately obtain a local entropy bound for $\mathcal{M}_q$.
Theorem 3.3. Suppose that Assumption A holds. Then

/ r x 18(^+1)9+1 

for all $q\ge q^\star$ and δ/ε ≤ 1, where

C e =L*(TW l) 1 / 6 (\\H \\i V H^Ht V \\H 2 \\\ V \\H 3 \\l)^ 

and $L^\star$ is a constant that depends only on d, $q^\star$ and $f^\star$.

To illustrate these results, let us consider the important case of Gaussian location mixtures, which are widely used in applications (see, e.g., [T2l [TBI [T5]).

Example 3.4 (Gaussian mixtures). Consider mixtures of standard Gaussian densities $f_0(x)=(2\pi)^{-d/2}e^{-\|x\|^2/2}$, and let $\Theta(T)=\{\theta\in\mathbb{R}^d:\|\theta\|\le T\}$. Fix a nondegenerate mixture $f^\star$, and define $T^\star=\max_{i=1,\dots,q^\star}\|\theta_i^\star\|$. Denote by $\mathcal{H}_q(\varepsilon,T)$ the Hellinger ball associated to the parameter set Θ(T). Then

X(X q (e,T),S)< (^^— 

for all $q\ge q^\star$, $T\ge T^\star$, and δ/ε ≤ 1, where $C_1^\star,C_2^\star$ are constants that depend on d, $q^\star$ and $f^\star$ only. To prove this, it evidently suffices to show that Assumption A holds and that $\|H_k\|_4$ for k = 0, 1, 2 and $\|H_3\|_2$ are of order $e^{CT^2}$. These facts are readily verified by a straightforward computation.
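How fast $\|H_0\|_4$ grows with the radius T in Example 3.4 can be explored in the simplest case. The sketch assumes d = 1, $q^\star=1$ with $f^\star$ the standard Gaussian density ($\theta^\star=0$), and $\Theta(T)=[-T,T]$. Then $H_0(x)=\exp(\sup_{|\theta|\le T}(\theta x-\theta^2/2))$, with the supremum attained at $\theta=\max(-T,\min(x,T))$. A Laplace-type computation suggests $\|H_0\|_4^4\approx2e^{6T^2}$, so $\log\|H_0\|_4/T^2$ should settle near 1.5. This covers only a special case and is not a proof of the general claim.

```python
import numpy as np

x = np.linspace(-25.0, 25.0, 100001)
dx = x[1] - x[0]
f_star = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)   # assumption: q* = 1, theta* = 0, d = 1

def H0(T):
    """H_0(x) = sup_{|theta| <= T} f_theta(x) / f*(x) = exp(sup(theta x - theta^2/2));
    the supremum is attained at theta = clip(x, -T, T)."""
    th = np.clip(x, -T, T)
    return np.exp(th * x - 0.5 * th ** 2)

def norm4(g):
    """||g||_4 in L^4(f* dmu), by a Riemann sum on the grid."""
    return (np.sum(g ** 4 * f_star) * dx) ** 0.25

for T in [1.0, 2.0, 3.0]:
    n4 = norm4(H0(T))
    print(T, n4, np.log(n4) / T ** 2)   # ratio settles near 1.5 for larger T
```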



Let us emphasize a key feature of Theorems 3.1 and 3.3: the dependence of the entropy bounds on the order q and on the parameter set is explicit (see, e.g., Example 3.4). In particular, we find that for every $f^\star$, the local doubling dimension of $\mathcal{M}_q$ at $f^\star$ is of the same order as the dimension of the natural parameter set $\Delta_{q-1}\times\Theta^q$ for mixtures, which answers the basic question posed in the introduction. Obtaining this explicit dependence, which is important in applications, is one of the main technical challenges of the proof. In order to show only that $\mathcal{N}(\mathcal{H}_q(\varepsilon),\delta)$ is polynomial in ε/δ without explicit control of the order, the proof could be simplified and substantially generalized; see Remark 3.6 below for some discussion. In contrast to the dependence on q and Θ, however, the proofs of Theorems 3.1 and 3.3 do not provide any control of the dependence of the constants on $f^\star$. In particular, while we can control the local doubling dimension of $\mathcal{M}_q$ at $f^\star$ in terms of q, we do not know whether the dependence on $f^\star$ can be eliminated.

Remark 3.5. We have not optimized the constants in Theorem 3.1 and Theorem 3.3. In particular, the constant 18 in the exponent can likely be improved. On the other hand, it is unclear whether the dependence on the diameter of Θ is optimal. Indeed, if one is only interested in the global entropy $\mathcal{N}(\mathcal{H}_q,\delta)$, where $\mathcal{H}_q=\{\sqrt{f/f^\star}:f\in\mathcal{M}_q\}$, then it can be read off from the proof of Theorem 3.1 that the constants in the entropy bound depend on $\|H_0\|_1$ and $\|H_1\|_1$ only, which are easily seen to scale polynomially in T due to the translation invariance of the Lebesgue measure. Therefore, for example in the case of Gaussian mixtures, one can obtain a global entropy bound which scales only polynomially as a function of T, whereas the above local entropy bound scales as $e^{CT^2}$. The behavior of local entropies is much more delicate than that of global entropies, however, and we do not know whether it is possible to obtain a local entropy bound that scales polynomially in T for the Hellinger metric. On the other hand, if $\mathcal{M}_q$ is endowed with the total variation metric $d_{TV}(f,g)=\int|f-g|\,d\mu$ rather than the Hellinger metric, then an easy modification of our proof yields a local entropy bound that depends only on $\|H_i\|_1$ (i = 0, …, 3), and therefore scales polynomially in T. In this case the scaling matches that of the global entropy, and is therefore optimal.

Remark 3.6. The problems that we address in this section could be investigated in a more general setting. Let $\mathfrak{F}=\{f_\theta:\theta\in\Theta\}$ be a given family of probability densities (where Θ is a bounded subset of $\mathbb{R}^d$), and define

$$\mathcal{M}_q=\Big\{\sum_{i=1}^q\pi_if_{\theta_i}:\ \pi_i\ge0,\ \sum_{i=1}^q\pi_i=1,\ \theta_i\in\Theta\Big\}.$$

The case that we have considered corresponds to the choice $\mathfrak{F}=\{f_0(\cdot-\theta):\theta\in\Theta\}$, but in principle any parametrized family may be considered.

Remarkably, most of the proof of Theorem 3.1 does not rely at all on the specific choice of $\mathfrak{F}$, so that very similar techniques may be used to study more general mixtures. The only point where the structure of $\mathfrak{F}$ has been used is in the local geometry Theorem 3.9 below, whose proof (using Fourier methods) relies on the specific form of location mixtures. We believe that essentially the same result holds more generally, but a different method of proof would likely be needed.

The proof of Theorem 3.9 below is rather technical: the difficulty lies in the fact that the result holds uniformly in the order q. This is necessary in order to obtain bounds in Theorems 3.1 and 3.3 that depend explicitly on q. If the explicit dependence on q is not needed, then our proof of Theorem 3.9 can be simplified and adapted to hold for much more general classes; see [TP].

Finally, we note that $\mathcal{M}=\bigcup_q\mathcal{M}_q$ is simply the convex hull of $\mathfrak{F}$. The problem of estimating the metric entropy of convex hulls has been widely studied [4j [TJ [HI [T2j [13]. In general, however, the convex hull is infinite-dimensional, so that this problem is quite distinct from the problems we have considered.

3.2. The local geometry of mixtures. At the heart of the proof of Theorem 3.1 lies a result on the local geometry of location mixtures, Theorem 3.9 below. Before we can develop this result, we must introduce some notation.

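Define the Euclidean balls $B(\theta,\varepsilon)=\{\theta'\in\mathbb{R}^d:\|\theta-\theta'\|<\varepsilon\}$, denote by $\langle u,v\rangle$ the inner product of two vectors $u,v\in\mathbb{R}^d$, and denote by $\langle A,u\rangle=\{\langle\theta,u\rangle:\theta\in A\}\subseteq\mathbb{R}$ the inner product of a set $A\subseteq\mathbb{R}^d$ with a vector $u\in\mathbb{R}^d$.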

Lemma 3.7. It is possible to choose a bounded convex neighborhood $A_i$ of $\theta_i^\star$ for every $i=1,\dots,q^\star$ such that, for some linearly independent family $u_1,\dots,u_d\in\mathbb{R}^d$, the sets $\{\langle A_i,u_j\rangle:i=1,\dots,q^\star\}$ are disjoint for every $j=1,\dots,d$.

Proof. We first claim that one can choose linearly independent $u_1,\dots,u_d$ such that $|\{\langle\theta_i^\star,u_j\rangle:i=1,\dots,q^\star\}|=q^\star$ for every $j=1,\dots,d$. Indeed, note that the set $\{u\in\mathbb{R}^d:|\{\langle\theta_i^\star,u\rangle:i=1,\dots,q^\star\}|<q^\star\}$ is a finite union of (d − 1)-dimensional hyperplanes, which has Lebesgue measure zero. Therefore, if we draw a rotation matrix Γ at random from the Haar measure on SO(d), and let $u_i=\Gamma e_i$ for all $i=1,\dots,d$, where $\{e_1,\dots,e_d\}$ is the standard Euclidean basis in $\mathbb{R}^d$, then the desired property will hold with unit probability. To complete the proof, it suffices to choose $A_i=B(\theta_i^\star,\varepsilon/4)$ with $\varepsilon=\min_k\min_{i\ne j}|\langle\theta_i^\star-\theta_j^\star,u_k\rangle|$. □
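The proof of Lemma 3.7 is constructive, so it can be sketched directly: draw a Haar-random rotation, read off $u_1,\dots,u_d$ as its columns, and set the radius to one quarter of the smallest gap between projected centers. The QR-based sampler below is a standard way to draw from the Haar measure (sign-corrected QR of a Gaussian matrix, followed by a column flip to land in SO(d)). The function names and the example centers are ours.

```python
import numpy as np

def haar_rotation(d, rng):
    """Sample from the Haar measure on SO(d) via sign-corrected QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    Q = Q * np.sign(np.diag(R))          # Haar-distributed on O(d)
    if np.linalg.det(Q) < 0:             # flip one column to land in SO(d)
        Q[:, 0] = -Q[:, 0]
    return Q

def lemma_directions(theta_star, rng):
    """Directions u_1..u_d (columns of U) and radius eps/4 as in the proof of Lemma 3.7."""
    q, d = theta_star.shape
    U = haar_rotation(d, rng)
    proj = theta_star @ U                # proj[i, k] = <theta_i*, u_k>
    eps = min(abs(proj[i, k] - proj[j, k])
              for k in range(d) for i in range(q) for j in range(q) if i != j)
    return U, proj, eps / 4.0            # A_i = open ball B(theta_i*, eps/4)

# Projections of these centers onto e_1 (and onto e_2) are not distinct;
# a random rotation fixes this with probability one.
theta_star = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
U, proj, radius = lemma_directions(theta_star, np.random.default_rng(4))
print(radius)
```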

We now fix once and for all a family of neighborhoods $A_1,\dots,A_{q^\star}$ as in Lemma 3.7. The precise choice of these sets only affects the constants in the proofs below and is therefore irrelevant to our final result; we only presume that $A_1,\dots,A_{q^\star}$ remain fixed throughout the proofs. Let us also define $A_0=\mathbb{R}^d\setminus(A_1\cup\dots\cup A_{q^\star})$. Then $\{A_0,\dots,A_{q^\star}\}$ partitions the parameter set $\mathbb{R}^d$ in such a way that each bounded element $A_i$, $i=1,\dots,q^\star$, contains precisely one component of the mixture $f^\star$, while the unbounded element $A_0$ contains no components of $f^\star$.
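Let us define for each finite measure λ on $\mathbb{R}^d$ the function

$$f_\lambda(x)=\int f_\theta(x)\,\lambda(d\theta).$$

We also define the derivatives $D_1f_\theta(x)\in\mathbb{R}^d$ and $D_2f_\theta(x)\in\mathbb{R}^{d\times d}$ as

$$[D_1f_\theta(x)]_i=\frac{\partial}{\partial\theta^i}f_\theta(x),\qquad[D_2f_\theta(x)]_{ij}=\frac{\partial^2}{\partial\theta^i\partial\theta^j}f_\theta(x).$$

Denote by $\mathcal{P}(A)$ the space of probability measures supported on $A\subseteq\mathbb{R}^d$, and denote by $\mathbb{M}_+^d$ the family of all d × d positive semidefinite (symmetric) matrices.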

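Definition 3.8. Let us write

$$\mathfrak{D}=\Big\{(\eta,\beta,\rho,\tau,\nu):\ \eta_1,\dots,\eta_{q^\star}\in\mathbb{R},\ \beta_1,\dots,\beta_{q^\star}\in\mathbb{R}^d,\ \rho_1,\dots,\rho_{q^\star}\in\mathbb{M}_+^d,\ \tau_0,\dots,\tau_{q^\star}\ge0,\ \nu_0\in\mathcal{P}(A_0),\dots,\nu_{q^\star}\in\mathcal{P}(A_{q^\star})\Big\}.$$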
Then we define for each $(\eta,\beta,\rho,\tau,\nu)\in\mathfrak{D}$ the function

P , r, „) = ro f -p + {v ijl + Pt + Tr 

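and the nonnegative quantity

$$N(\eta,\beta,\rho,\tau,\nu)=\tau_0+\sum_{i=1}^{q^\star}|\eta_i|+\sum_{i=1}^{q^\star}\|\beta_i\|+\sum_{i=1}^{q^\star}\mathrm{Tr}[\rho_i]+\sum_{i=1}^{q^\star}\tau_i\int\|\theta-\theta_i^\star\|^2\,\nu_i(d\theta).$$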

We now formulate the key result on the local geometry of the mixture class M. 

Theorem 3.9. Suppose that

(1) $f_0\in C^2$, and $f_0(x)$ and $D_1f_0(x)$ vanish as $\|x\|\to\infty$.

(2) $\|[D_1f_0]_i/f^\star\|_1<\infty$ and $\|[D_2f_0]_{ij}/f^\star\|_1<\infty$ for all $i,j=1,\dots,d$.

Then there exists a constant $c^\star>0$ such that

$$\|\ell(\eta,\beta,\rho,\tau,\nu)\|_1\ge c^\star N(\eta,\beta,\rho,\tau,\nu)\quad\text{for all }(\eta,\beta,\rho,\tau,\nu)\in\mathfrak{D}.$$

[The constant $c^\star$ may depend on $f^\star$ and $A_1,\dots,A_{q^\star}$ but not on η, β, ρ, τ, ν.]

Before we turn to the proof, let us introduce a notion that is familiar in quantum mechanics. If (Ω, Σ) is a measurable space, call the map $\Lambda:\Sigma\to\mathbb{R}^{d\times d}$ a state$^a$ if

(1) $A\mapsto[\Lambda(A)]_{ij}$ is a signed measure for every $i,j=1,\dots,d$;

(2) Λ(A) is a nonnegative symmetric matrix for every A ∈ Σ;

(3) Tr[Λ(Ω)] = 1.



$^a$ Our terminology is in analogy with the notion of a state on the C*-algebra $\mathbb{C}^{d\times d}\otimes C_{\mathbb{C}}(\Omega)$, where Ω is a compact metric space and $C_{\mathbb{C}}(\Omega)$ is the algebra of complex-valued continuous functions on Ω. Such states can be represented by the complex-valued counterpart of our definition.



D2fe 
f* 



+ n 



f* 



Pi 
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It is easily seen that for any unit vector £ £ M. d , the map A i-> (£, A(A)£) is a 
sub-probability measure. Moreover, if £i,...,£d € R d are linearly independent, 
there must be at least one £j such that (£,-, A(fi)£j) > 0. Finally, let B C R d be 
a compact set and let (A n )„>o be a sequence of states on B. Then there exists a 
subsequence along which A n converges weakly to some state A on B in the sense 
that / Tr[M(0)A„(d0)] -> / Tr[M(0)A(d0)] for every continuous function M : B -> 
R dxd . To see this, it suffices to note that we may extract a subsequence such that 
all matrix elements [X n ]ij converge weakly to a signed measure by the compactness 
of B, and it is evident that the limit must again define a state. 



Proof of Theorem \3.9l Suppose that the conclusion of the theorem does not hold. 
Then there must exist a sequence of coefficients (r/ n , j3 n , p n , t™ , v n ) £ D with 

\\£( V n 1 /3 n ,p n ,T n ,iy n )\\ 1 „_>«,. 



N(r) n ,f3 n ,p n ,T n ,v n ) 

Let us fix such a sequence throughout the proof. 
Applying Taylor's theorem torn-) fe*+u(G-e*) 



U0. 



we can write for 



nJJl 



■Pi 



Dife 
f* 



Tr 



D 2 fg 



f* 



M 
f* 



(e-et)K{de) 



*D 1 f e 



Tr 



1 D2fe*+u(6-6l) 



Tr 



D2fe 
f* 



2(1 -u)du\ A? {dff) 



where A™ is the state on A{ defined by 

(it is clearly no loss of generality to assume that vf has no mass at 9* for any i, n, 
so that everything is well defined). We now define the coefficients 



for i = 1, . 



Note that 



N(r) n ,f3 n ,p n ,T n ,i/ n ) ' 

a r_L 

~~ N(r] n ,f3 n ,p n ,T n ,v n )' J 
, q* , and 

a" 



d" 



N{<q n ,f3 n ,p n ,T n ,v n ) : 
N(?i n ,(3 n ,p n ,T n ,is n ) 



'o 



N(r) n ,(3 n ,p n ,T n ,v n ) 



q 



KI + > iK| + ||6?||+Tr[c?] + |c?|} = l 



for all n. We may therefore extract a subsequence such that: 

(1) There exist a { £ R, 6, £ R d , c t £ Mf, and a ,di > (for i = 1,. . .,<?*) 

with |o | + X)?=i {|o»| + \\bi\\ + Tr[a] + \di\} = 1, such that a£ a and 
a™ — t> a,;, 6" — >• 6,;, c™ — >• c*, — > di as n — > oo for alH = 1, . . . , q* . 

(2) There exists a sub-probability measure v$ supported on Aq, such that 
converges vaguely to Vq as n — > oo. 
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(3) There exist states A, supported on c1j4, for i = l,...,q*, such that A™ 
converges weakly to Ai as n — > oo for every i = 1 , . . . , q*. 
The functions Krf 1 , (3 n , p n , r™ ,i/ n )/N(rj n , (3 n , p n ,r™ , v n ) converge pointwise along 
this subsequence to the function h/ f* defined by 

h = a f vo + ^2 \ a i + b i D ifet + Tr [ c i D 2fe*] 
i=i ^ 

+ d t jTi j jf D 2 f et+H0 _ et) 2(l-u)du^Xi(dd) J. 

But as ||^«,/3 ,l , j o",r ,l ,^")||i/iV(?7",/3™,p",T Tl ,^") ->■ 0, we have = b Y 

Fatou's lemma. As /* is strictly positive, we must have h = 0. 
To proceed, we need the following lemma. 

Lemma 3.10. The Fourier transform F[h}(s) :— J e^ x,s 'h(x)dx is given by 



a J MdO) + £ U e'W-') + i(b u s) e^^ 

- (s, c lS ) e l < e *^ - d t jW> s ) [ <j>(i{6 - flf, a)) (s, X t (d9)s) 



F[h](s) = F[f ](s) 



for all s £ M. d . Here we defined the function 4>{ u ) — 2(e u — u — l)/u 2 . 

Proof. The ai,bi, Ci terms are easily computed using integration by parts. It remains 
to compute the Fourier transform of the function 

[Si(x)]j k = J | ^ [D2f9*+u(e-e*)(x)]jk 2(1 - u) du\ [Xi(d9)] kj . 
We begin by noting that 

ill ^ D2 f e t +u( - 9 ~ 6 i^ X ^ jk \ 2 ^ ~ U ) dudx \\^i\kj\{dO) = 

\\[K]kj\\iw J \[D 2 fo(x)]jk\dx < oo. 
We may therefore apply Fubini's theorem, giving 

F[[S i ] jk }(s) = -F[f }(s)s j s k e i ^^ f ( j e iu ^^2(l - u)du\[Xi{d9)] kj 



= -F[f }(s) s jSk jW*) j <t>(\{6 - 91 s)) [Xi(d9)] kj , 

where we have computed the inner integral using integration by parts. □ 

Let u%, . . . , Ud G K d be a linearly independent family satisfying the condition of 
Lemma I3J1 As F[h](s) = for all s € R d , we obtain 



¥{it) := a + (? m ' ue) {ai + it(b u u e ) - t 2 (u e , c t u t ) - d t t 2 = 

»=i 

for all £ = 1, . . . , d and t G [—6, t] C K for some t > 0, where we defined 
Sftit) = [</>(it{0 - 9*,u e )) (ut,\i(d0)u £ ) 
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for i = 1, . . . , q*, and 

Indeed, it suffices to note that F[/ ](0) = 1 and that s H> F[f ](s) is continuous, so 
that this claim follows from Lemma [3.101 and the fact that F[f ](s) is nonvanishing 
in a sufficiently small neighborhood of the origin. 

As all Xi have compact support, it is easily seen that for every i = 1, . . . , q*, the 
function $f (z) is defined for all z £ C by a convergent power series. The function 
SSf e (it) := ¥(\t) - a <I>o(ii) is therefore an entire function with |^(,z)| < kie k2 ^ 
for some k\, fc 2 > and all z £ C. But as ¥{it) = for t £ [— t, t], it follows from 
[PTj . Theorem 7.2.2 that oq $o(it) is the Fourier transform of a finite measure with 
compact support. Thus we may assume without loss of generality that the law of 
(9, ui) under the sub-probability i/q is compactly supported for every £ = 1, . . . , d, 
so by linear independence vq must be compactly supported. Therefore, the function 
¥(z) is defined for all z £ C by a convergent power series. But as <& l (z) vanishes 
for z £ i[— l, i], we must have <£> £ (z) = for all z £ C, and in particular 

9* 

(3.1) ¥{t) = a ¥ {t) + J2 e m ' Ue) + *<&i. ui) + t 2 (u e , cm) + d t t 2 $f (t)} = 

»=1 

for all t £ R and £ = 1, . . . , d. In the remainder of the proof, we argue that (13. ip 
can not hold, thus completing the proof by contradiction. 

At the heart of our proof is an inductive argument. Recall that by construction, 
the projections {(A;, ui) : i = 1, . . . , q*} are disjoint open intervals in R for every 
£ = l,...,d. We can therefore relabel them in increasing order: that is, define 
(£l),...,(£q*) £ {1,...,?*} so that (0f ny u t ) < (0{ a) ,ut) <■■■< (0f v)) u*>. The 
following key result provides the inductive step in our proof. 

Proposition 3.11. Fix £ £ {1, . . . , d}, and define 

¥ (t) :=a + ^ a; e'< e >'>. 

i=l 

Suppose that for some j £ {1, . . . , q*} we have $> e ' J (t) = for all t £ R, where 
j 

¥>i(t) := ¥ (t) + e^U)^) {t{b m ,u t ) + t 2 (u e ,c {u) u e ) + d {H) t 2 <f e {u) (t)}. 

i=l 

Then d^j)(ut,X^j)(R d )ut) = 0, (ug,c^ut) = 0, and (byj^ui) = 0. 

Proof. Let us write for simplicity 6f = (9*,ui), and denote by Xf and v$ the fi- 
nite measures on R defined such that J f{x)X i {dx) = J f ((9 , ui))(u£, Xi{d8)ui) and 
/ f(x)v^{dx) = j f((0, ui))v[){d9), respectively. For notational convenience, we will 
assume in the following that (£i) = i and ^({df}) = for all i = 1, . . . , q*. This 
entails no loss of generality: the former can always be attained by relabeling of the 
points 9*, while $q is unchanged if we replace Vq and Oj by Vq{ ■ n R\{0f, ■ • ■ , 9 l q * }) 
and di + ao Vq{{Q{Y), respectively. Note that 

(Ai , ui) = ]9f- , 0f + [, where < 0f < < 0^ for all t 

by our assumptions ((Ai,ui) must be an interval as Ai is convex). 
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Step 1. We claim that the following hold: 

dj = for all i > j + 1 and a>o ^od^j+i' 00 D = 0- 
Indeed, suppose this is not the case. Then it is easily seen that 

where we have used that v$ has no mass at {0{, ...,0**}. On the other hand, as 
is positive and increasing and as A, is supported on cl Ai, we can estimate 

+2 a t6' ff.tr 



< * et '^ {t) < t 2 e -*(^+i-»f) 4>(t{0^ - 0j}) Xi(R) ^ 
g t8 j+i J 

for i = 1, . . . , j. But then we must have 



\&*(t)\ 



= liminf v > 0, 

t^oo e t0 j + l 



which yields the desired contradiction. 

Step 2. We claim that the following hold: 

dj-A^([^,oo[) = 0, (ue,CjUt} = 0, and a ^([^, oo[) = 0. 

Indeed, suppose this is not the case. As Vq({9j}) = 0, we can choose e > such 
that ^o{[9j +£, oo[) > Vo([0j, oo[)/2. As do, rfj > 0, and using that </> is positive and 
increasing with </>(0) = 1 and that e £t > (et) 2 /2 for t > 0, we can estimate 

a *o(t) + e te i{t 2 {ue,c jUe ) + d 3 t 2 > 

i 2 e^ { jao^([^,oo[) + (^, Cj ^> +d,A*($,oo[)} > 

for all t > 0. On the other hand, it is easily seen that 



f 2 



5] e 40 - {a, + *<6i, u/> } + ^ e* 9 * {t 2 (^, CjU<) + t 2 ¥ t (t)} 

_i=l i=l 

But this would imply that 

= lim W 



a *$(*) + e te ,{t 2 ( U ,, CjU ,) + dj t 2 *<(*)} 

which yields the desired contradiction. 

Step 3. We claim that the following hold: 

dj \%[6]- , 0] [) = and a , 9$ [) = 0. 

Indeed, suppose this is not the case. We can compute 



°=dT 2 



^ {^f) =<hj e^H*(d9)+a j e^{0-6Z) 2 4{de) 



+ H^2 e-^-^iai + t(bi,u e ) + t 2 (u e , Ci ue) + d, t t 2 
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where the derivative and integral may be exchanged by |22j . Appendix A16. We 
now note that as ao,dj > 0, we can estimate for t > 



W X^dB) +a J e^-oP (9 - B\f v e (d6) > 
e^--°f){ dj Xt([9t-,8l[) + a 



J J 



On the other hand, as (e x — l)/x is positive and increasing, we obtain for t > 



-t(ej--ej) 



9 l 3 - eff 



2(e* -ef) 



\\{de)+ I e***-^ xi(de) 



3 t(fl*+-flf) _ 1 



which converges to zero as t — > oo for every i < j. It follows that 



+ e^ + -^K(R), 



0= lim 



ill 



(<&^(t)/ 



1, 



which yields the desired contradiction. 

Step 4. Recall that Aj is supported on [6^~ , ^ + ] by construction. We have 
therefore established in the previous steps that the following hold: 

dj(ue, Xj(R d )ui) = (u£,CjU£) = a a f o ([0j~, oo[) = 0, a, = for i > j. 

It is therefore easily seen that 



<F> J '(t) 

= , lim W = ( b 3> U t) 



Thus the proof is complete. 



□ 



We can now perform the induction by starting from (|3.1I) and applying Proposi- 
tion [3TTT] repeatedly. This yields dj{u^,Xj{S, d )up) — (ug,CjUi) = (bj,ug) = for all 
j = 1, . . . , q* and I = 1, . . . , d. As u±, . . . , Ud are linearly independent and Cj £ M d , 
this implies that bj — 0, Cj — and dj = for all j = 1, . . . , q*, so that 



«o 



for all s £ M. d (this follows as above by Lemma [3.101 h = 0, F[/o](s) ^ for s 
in a neighborhood of the origin, and using analyticity). But by the uniqueness of 
Fourier transforms, this implies that the signed measure eto vq + J2i=i a i ${0*} nas 
no mass. As v is supported on A , this implies that dj = for all j = 1, . . . , q*. 
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We have therefore shown that a,, 6j, Cj, di = for all i = 1, . . . , q*. But recall that 

N + E^ilN + INI + Tr M + Mill = 1, so that evidently a Q = 1. 
To complete the proof, it remains to note that 

9* 



N(r] n ,f3 n ,p n ,T n ,v n ) 
But this is impossible, as 

£(r] n ,/3 n ,p n ,T n ,P n ) 



f * dlI = J2a?^l. 



i=0 



^0 



a 



N(r] n ,(3 n ,p n ,T n ,v n ) 
by construction. Thus we have the desired contradiction 

3.3. Proof of Theorem 13.11 The proof of Theorem 13.11 consists of a sequence of 
approximations, which we develop in the form of lemmas. Throughout this section, 
we always presume that Assumption A holds. 

We begin by establishing the existence of an envelope function. 

Lemma 3.12. Define S = (H + H x + H 2 ) d/c*. Then S G L 4 (f*dp), and 

I///* - 1 



II///* -111 



< S for all f e M. 



Proof. That S 6 L 4 (f*d/j,) follows directly from Assumption A. To proceed, let 
/ € M q , so that we can write / = J2i=i n ifoi- Then 



/ - f* 
f* 



E ^f+EU E + E 



hj - fe 



i=l (. \j:8j£Ai 



Taylor expansion gives 

f 9] {x) - fg t ( x ) = (e, - e*yD 1 fg t (x)+ 

2 

Using Assumption A, we find that 



/* ' ^ " 3 f* 



j ~ e*)*D 2 f et+u{e ._ 0t) {x) fa - 6*) 2(1 - u) du. 



f-r 



/* 



< 



E ^+E 



E 



j-Sj&Ai 



E 

j-.OjeA, 



\ E ^ 



Q*l|2 



2 j II J 



(H + H 1 +H 2 )d. 



On the other hand, Theorem l3.9l gives 

E ^+E 

j:6j£A i=l 



/-/* 


> c* 


/* 


1 



E tj-t* 



I E ^ 



j-.Bj&Ai 



E 

The proof follows directly. 

Corollary 3.13. \d\ < D for all d G CD, where D = 2S G L 4 (f*dfi). 



3*11 2 



□ 
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Proof. Using ||/ - /*|| TV < 2h{f, /*) and - 1| < \x - 1|, we find 



where we have used Lemma 13.121 



< 2S*, 



□ 



Next, we prove that the Hellinger normalized densities df can be approximated 
by chi-square normalized densities for small /*). 



Lemma 3.14. For any f £ M ; we have 



V77r-i ///* - 1 



<{4\\S\\lS + 2S 2 }h(f,n, 



where we have defined the chi-square divergence x 2 (f\\.f*) — II///* ~~ 1|||- 
Proof. Let us define the function R as 



■/__,!/ /- /* 
/* 2\ /* 



i? 



Then we have 

VfTF-i ///*-i 



///* - 1 + i? 



///* - 1 



M/,/*) vWII/*) ll///*-i + ^ll2 ll///*-i|| 2 

(//r - i + R){\\f/r - lib - ii///* - 1 + + mf/r - 1 + Rh 

||///*-l + J R|| 2 ||///*-l|| 2 
so that by the reverse triangle inequality and Corollary 13.131 

y/77F-i fir -i 



h(f,n 

Now note that for all x > —1 



vW) 



< 



2||fl|| 2 5+|fl| 
ll///*-l|| 2 ■ 



X 2 f-v/l + X - l) 2 , X 

— < - vv i- = vT+i- 1 - - < o. 

2 - 2 2 ~ 



Therefore, by Lemma T3. 121 



\R\ < 



f - /* 
/* 



< S 2 



/-/* 



< s 2 



/-/* 



/* 



/ - /* 



/* 



The proof is easily completed using ||/ — /*||tv < 2/i(/, /*) 
Finally, we need one further approximation step. 



□ 



Lemma 3.15. Let q £ N and a > 0. Then for every f £ M q such that h(f, /*) < a, 
it is possible to choose coefficients rji £ M, j3i £ Mr, pi £ for i = 1, . . . , q* , and 
7i > 0, 9i £ for i = 1, . . . , q, such that Y^i=i ran k[pi] < <7 A tig*, 
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and 



where we have defined 



fir - 1 



J3/2 /o 



<=E^+^+^ 



D 2 fe 



9 f 
3=1 1 



Proof. As / € M 9 , we can write / = Xw=i ^i/fl? ■ Note that by Theorem 



i=l j-.Bj^Ai 

Therefore, h(f,f*) < a implies -Kj\\9j — 6*\\ 2 < 4a/c* for 9j 6 Ai. In particular, 
whenever 6j S A,-, either 7Tj < 2y/a/c* or - #*|| 2 < 2^/a/c*. Define 

i— 1,. . . ,g* 

Taylor expansion gives 

/,» - f et (x) = (6, - OtYD^ix) + ±(6j - e*)*D 2 f et (x) (9, - 6*) + R^x), 
where |%| < id 3/2 ||^ - 6£|| 3 # 3 . We can therefore write 

f }f = L +i2 E 



i=l je.J:8j£Ai 



where we have defined 

9* 



j "i . 



i=l k \j£J:0jEAi 



jEJ-.O-jEAi 



Now note that 

sir - 1 



Vx 2 (f\\r) \\ L h 



< 



i//r-i| n//r-i-L|| 2 , i///* - 1 - i| 



< 



ll///*-l||a ||£|| 2 ||£||a 

\\f/r-i-L\\ 2 s + \f/r-\-L\ 



\\Lh 

where we have used Lemma l3.I2l By Theorem 13.91 we obtain 
||i|| 2 >||L||i>yE E 



i=l jeJ-.BjEAi 



Therefore, we can estimate 



\f/f* - 1 - L[ < rf 3/2 ^3 ELi T^e.J:e 3 eA t *j 



3*113 



l|i|| 



< 



4a\ 1/4 d 3 / 2 F 3 



4a V 

c 7 / 3c* 
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where we have used the definition of J. Setting i = L/\\L\\2, we obtain 
///* - 1 



Vx 2 (f\\n 



j3/2,/9 



It remains to show that for our choice of I = L/||i||2, the coefficients 7 in 

the statement of the lemma satisfy the desired bounds. These coefficients are 



Vi 



71 j 1 ]? J 



\\L\\ 



Clearly rank[pi] < #{j : 9j 6 Aj} A d, so Xh=i rank[pi] < q A (ig*. Moreover, 



l|i||2>C* 



E **+£ 



E 

j^eAi 



2 5^ 



3* II 2 



by Theorem |3J3 It follows that X^ = i Tr^] < 1/c*. Now note that for j ^ J such 
that 9j £ Ai, we have \\9j ■ — 9*\\ 2 > 2y / a/c* by construction. Therefore 



> c* 



E Ti + sE E 



> (Vc*a A c*) 7Tj-. 



It follows that X)?=i I Til — l/(v / c*« A c*). Next, we note that 



E 


E ^ 




9* 

^E 


E 


+ E ^ 


i=l 


jeJ-.BjGAi 








jgJ-.ejgAa 



Therefore X)i=i l^l — l/ c * + l/V c*a. Finally, note that 



E 



E 

j&J:9jeAi 



Q 

^E 

i=i 



E 

j-.OjGA, 



jgJ-MjgAo 



Therefore £? =1 HAH < V c * + 2T/V&a. The proof is complete. 

We can now complete the proof of Theorem 13.11 
Proof of Theorem \3.1l Let a > be a constant to be chosen later on, and 
V q , a = {d f /*, h(f,n < a}. 

Then clearly 

X(V q ,6) < N(B^ a , S) + W(D,\D,, a> 5). 
We will estimate each term separately. 
Step 1 (the first term). Define 

M q = {(mi, . . . , m q *) eZ' : mi + • • • + m q * = q A rfg*}. 



□ 
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For every m £E M q , we define the family of functions 



q,m, a, 



f* 



3 = 1 



f* 



f* 

3 = 1 J 

(»7,/3,p,7,e) € 3 q , m , a 



where 



J q,m,a 



(77, 0, p, 7, 9) € K 9 * x x (M d ) mi x • • • x xR«x9«: 

5>|<I+ * ^|| ft ||<I + J^, 

f-f c* % /c*a f-f c* Vc*a 

1=1 J=l 9=1 v J 



Define the family of functions 



From Lemmas 13.141 and !3.15[ we find that for any function d G there exists a 

function ^ G L q _ a such that (here we use that h(f, /*) < v2 for any /) 

_ J3/2 /o 

|d - *| < {4||S|| 2 S + 25 2 } (a A 72) + — {||ff 3 || a S + #3} « 1/4 - 
Using a A \/2 < 2 3 / 8 a 1 / 4 for all a > 0, we can estimate 

'1 + M2 



|<2-£| < a 1/4 [/, {/ 



, , 8\\S\\l + Ajd 3 / 2 {S + S 2 + H 3 }, 



where U £ L 2 (f*dn) by Assumption A. Now note that if mi < £ < m 2 for some 
functions mi, m2 with ||m2 — mi||2 < £, then mi — a 1 / 4 U < d < rri2 + a 1 / 4 J7 with 
||(m 2 + a 1/4 C/) ~ (mi ~a^ 4 U)\\ 2 < e + 2a 1 / 4 \\U\\ 2 . Therefore 

K(2\ Q ,£ + 2a 1 / 4 ||£/|| 2 ) <N{£ q , a ,e) < ^K(£ gim , Q ,e) for e > 0. 

mEM, 

Of course, we will ultimately choose e, a such that e + 2a 1 / 4 ||[/||2 = 5. 

We proceed to estimate the bracketing number N(£q. miQ , e). To this end, let 
g,m,Q5 where £ is defined by the parameters (77, /3, 'y, $) € 3q,m,a 

and is 

defined by the parameters (77', /?', p',7', 0') £ 3 g ,m,o!- Note that 



EE 

i=i j=i 



Pa— fir Pa ~ (/>«) —fir Pa ^-^ H ^l^l^ ms ~ PaW- 

3 3 V i=l 7 = 1 
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We can therefore estimate 



i — l i—1 j— 1 

~ q* mi 1 1 / 2 

E E wpij-p'ijW 



£— Hl max Pj -e'\\ + 2 -^H 2 



. »=i i=i 

where we have used that |/g — fe'\/.f* < ||0 — by Taylor expansion. 

Therefore, writing V = (H a + Hi + H 2 ) dy/dq*, we have 

\£-£'\<v in {r,, (3, p, 7 , 0) - (r/, /?', p', y, 0')IIU, a , 

where |||-||l m „ is the norm on R(i+d)q*+d(qA,d q *)+(i+d)q defined by 



0* 



IK*?, A 7, f)|||,, m , Q = E N + E II All + E M 

i—l i—l j= 1 



1 2 

= max 1 1 j 1 - 



'c*a A c* j=i.-".9 Vc 



L i=i i=i 

Note that if \\\(ri, (3, p,j,9) - (r]', p', 7', 0')lll 9 ,m,a ^ e '> tnen wc obtain a bracket 
f - e'V < ^ < e + e'V of size + e'V) - (£' - s'V)\\ 2 = 2e'\\V\\ 2 . Therefore, if 
we denote by N{3q, m ,a, ||H|| 9 m Q , e') the cardinality of the largest packing of 3 q , m ,a 
by e'-separated points with respect to the |||-||| m Q -norm, then 

N(£ g , m ,a,e) < iV(V,«, 1-Ill g , m ,a. e / 2 II^II2) fOT £ > 0- 
But note that, by construction, 3q tm . a is included in a |||-||| m Q -ball of radius not 
exceeding (6 + 3T)/(Vc*aAc*). Therefore, using the standard fact that the packing 
number of the r-ball B{r) = {x e B : |||x||| < r} in any n-dimensional normed space 
(B, HI-HI) satisfies N(B(r), |||-|||,e) < (^) n , we can estimate 

(l+d)g*+d(<ZA<V) + (l+d) 9 

/ — I— I I I 1 --> 1 I i - ■ . \ I ■ I \ I ■ I I / f " : ' \ 

N(£,, m ,„,e) < 



4||F|| 2 (6 + 3r)/(Vc^Ac*) +e 
In particular, if e < 1 and a < c*, then 

Finally, note that the cardinality of M q can be estimated as 

#M 9 = (** + * A - ^ < /V + gAdg*-l\ 

^ q V gArff / - V qAdq* ) 

where we have used that q > q* . We therefore obtain 

N(D g , a ,5)< e n^q^s-^^wuh) 

meM q 

( 24(2 + T)\\V\\ 2 /V¥+V¥ ^ 3{d+1)q 
~\ (5 - 2a^\\U\\ 2 )^ 
whenever Klanda< (5/2\\U\\ 2 ) 4 A c*. 
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Step 2 (the second term). For /, /' e M g with h(f, /*) > a and /*) > a, 



,, | _ KV777 7 - ^ily/TTT 7 - 1|| 2 - (V777 7 - i)ll V7V7^ - iy 
" /l_ Kf,nh(f',n 



< \\VJW*-VW^\2\VW*-A + ^\VW*-VJW\ 
— 9 

cr 

where we have used that /*) < y/2 for any /. Now note that 
\\fa - Vb\ 2 < \y/a- Vb\ (y/a + Vb) =\a-b\ 



for any a,b > 0. We can therefore estimate 

ii(/ - f )//iii /2 (%/So +1) + V2 \(f - nm' 2 



\df-d' f \ 



< 



or 

where we have used that | \J f / f* — 1| < y/Ho + 1 for any / £ M. Now note that if 
we write / = J2i=i ^ifo, an( i /' = Y11=i ^if S'i then we can estimate 



/* 



< HoJ" \ni - wll + HxVd max ll^-^l 

* — ' i— 1,...,<7 



■ 9 

i— ± 

Defining 

= (V^o + l)||flo + H^Wl 12 + V2(H + HiVd) 1 / 2 , 

we obtain 

\df -d)\<- 2 |||M) - W,tf)fJ\ = £ 1^1 + max ||0,|| 

CM y ^ — ' 1=1,. ...q 

1=1 

(clearly |||-||| defines a norm on R( d+1 ^). Now note that if |||(7r,0) - (n',6')\\\ q < e, 
then we obtain a bracket d' f - e 1/2 W/a 2 < d f < d' f + e 1/2 W/a 2 of size \\(d' f + 
s^ 2 W/a 2 ) - (d) - e l / 2 W/a 2 )\\ 2 = 2e 1 / 2 \\W\\ 2 /a 2 . Therefore 

N(D,\D,, Q , <J) < N(A q x 0', Hi-Ill^ a 4 <5 2 /4||W||l), 

where we have defined the simplex A q — {ir G : ^' =1 7ri = 1}. We can now 
estimate the quantity on the right hand side of this expression as before, giving 



8(l + T)\\W\\ 2 2 + (c*)^ {d+1)q 
a 4 S 2 



N(V q \'D q , a ,6) < 

for S < 1 and a < c* . 

End of proof. Choose a = (i5/4||?7||2) 4 - Collecting the various estimates above, 
we find that for 8 < 1 A 4(c*) 1 / 4 (as \\U\\ 2 > \\S\\i > 1 by LemmaEE]) 

/ 4 18 (1 + r)||t/|li 6 H^||| + 4 16 ||C/||f (c*) 4 x (d+1)q 

+ 1 £18 



r eg (T V 1)V6 (iii/iij, v ||V|| a V ||W|| a )N "'*' " '"' 
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where 4 = 12(c*)-^ 12 + 2(c*) 1 / 12 + 4(c*) 4 / 18 + 8. It follows that 

< ( c*(t v i) i/6 (iigoin v ijgiiit v \mi v \m\ 2 2 ) y {d+1)q 

for all 6 < 6*, where C* and S* are constants that depend only on c*, d, and q*. 
This establishes the estimate given in the statement of the Theorem. The proof of 
the second half of the Theorem follows from Corollary 13. 131 and H-ffolU > 1- ^ 

Acknowledgment. The authors would like to thank Jean Bretagnolle for pro- 
viding an enlightening counterexample that guided some of our proofs. 
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